Suspicions of two bridgehead invasions of Xylella fastidiosa subsp. multiplex in France

Of American origin, a wide diversity of Xylella fastidiosa strains belonging to different subspecies have been reported in Europe since 2013 and its discovery in Italian olive groves. Strains from the subspecies multiplex (ST6 and ST7) were first identified in France in 2015 in urban and natural areas. To trace back the most probable scenario of introduction in France, the molecular evolution rate of this subspecies was estimated at 3.2165 × 10-7 substitutions per site per year, based on heterochronous genome sequences collected worldwide. This rate allowed the dating of the divergence between French and American strains in 1987 for ST6 and in 1971 for ST7. The development of a new VNTR-13 scheme allowed tracing the spread of the bacterium in France, hypothesizing an American origin. Our results suggest that both sequence types were initially introduced and spread in Provence-Alpes-Côte d’Azur (PACA); then they were introduced in Corsica in two waves from the PACA bridgehead populations.


DNA extraction and whole-genome sequencing 26
Fifty-five strains were sequenced in this study using Illumina HiSeq X 4000 or PacBio technology 27 (Supplementary data 1). For the 48 strains sequenced by Illumina HiSeq X, highly concentrated 28 bacterial suspensions were prepared from fresh cultures in 5 mL of ultrapure sterile water. The 29 genomic DNA was extracted using the Wizard® Genomic DNA Purification Kit (Promega), 30 following the manufacturer's recommendations. DNA was rehydrated in 100 µL of ultrapure 31 sterile water. Quality and quantity of extracted DNA were verified using a NanoDrop ND-100 32 Spectrophotometer (Thermofisher) and a Qubit Fluorometer 1.0 (Invitrogen

Scenarios 104
The bottom-up approach consisted in searching for the most probable topology for each 105 population combination of the American ancestor and two French populations in presence or not 106 of an unsampled (ghost) population. Altogether this lead to three independent analyses and in each 107 analysis, 10 to 12 topologies were tested (Fig. S9 A). A total of 30 scenarios (combination of 108 population choice and topology) were analyzed for ST6 and 36 for ST7 in this bottom-up approach. 109 Topologies 1, 5, 9, 10, 11 and 12 assumed that the French populations were introduced separately 110 from two events. Topologies 2 and 6 hypothesizes that samples which belonged to the same 111 population may have been mistakenly separated into two populations. Topologies 3, 4, 7 and 8 112 assumed that a first population was introduced in France and that a bridgehead invasion was 113 responsible for the emergence of the second French populations. Topologies 5, 6, 7, 8, 10, 11 and 114 12 assumed that one or two populations may have been unsampled (named ghost population). 115 The top-down approach consisted in searching for the most probable population combinations 116 for each possible topology. A total of 5 topologies were tested for each of the three French ST6 117 and ST7 populations (six combinations) (Fig. S9 B). The topologies differed by the number of 118 independent introductions and divergence events. However, based on the results of the bottom-up 119 approach, a number of scenario were considered statistically impossible and were not tested in the 120 top-down approach. 121 In consequence, from the six possible population combinations all were tested for ST6, for a total 122 of 30 scenarios (Fig. S10) and only two topologies for ST7 for a total of 12 scenarios (Fig. S11). 123 Class 1 topology (I) assumed three independent introductions in France from a source population. 124 Class 2 topology (II) assumed that one population was introduced in France and is responsible for 125 two independent bridgehead invasions. Class 3 topologies (III.I and III.II) assumed that two 126 independent introductions occurred and one of them was responsible for a bridgehead invasion of 127 the third population. Class 4 topology (IV) assumed that one population was first introduced and 128 was responsible for two successive bridgehead invasions. 129

Choice and validation of priors 130
For the DiyABC analyses, parameters were drawn in the prior distribution described in 131 Supplementary data 8. Due to the lack of knowledge concerning the bacterium population biology 132 (population effective sizes, date of founder event, existence and duration of a bottleneck, for 133 example), a log uniform distribution with large ranges was chosen. Parameters for microsatellite 134 mutation models were set by default following a stepwise mutation model 8 . Other parameters of 135 the mutation models were set by default at "0" and all summary statistics were selected. 136

Simulations and posterior probability estimation 142
A total of 106 simulations were performed for each competing scenario using DiyABC software 9 . 143 Because the DiyABC lack of discriminative power among competing scenarios, abcrf were used 144 to estimate best scenarios or the rejected one, using the R package "abcrf" 10 . Following Pudlo et 145 al., 2016 abcrf treatments were processed on reference tables including 104 simulations per 146 scenario 10 . In the constructed random-forests the number of trees was fixed at n=1000. For each 147 abcrf analysis the best scenario, its prior error rates and its posterior probability were evaluated 148 over 10 replicate runs of the same reference interest, and consistent with SNP data, strains belonging to ST6 were scattered into two clusters, 177 one grouping French and American ST6 strains, and the other grouping ST6 strains from Spain 178 (Fig. S5). The MLVA accurately resolved the different haplotypes in our international strain 179 collection as the haplotype accumulation curve reached a plateau with more than 95% of the 180 haplotypes detected with nine or more markers (Fig. S6).  (Table 1); ii) selecting a highly efficient DNA polymerase, meaning the least affected by PCR 187 inhibitors. Platinum™ Taq DNA Polymerase (ThermoFisher) was selected as it allowed 188 amplification of some VNTR loci (ASSR-9, ASSR-16, GSSR-4) in bacterial suspensions at ten 189 times lower concentrations than GoTaq® G2 Flexi DNA Polymerase (i.e. concentrations as low 190 as 10 3 to 10 4 CFU.mL -1 ); and iii) setting sample DNA aliquot volume to 2 µL per 20 µL-total 191 volume reaction to maximize target presence and amplification, as no inhibitory effect was noticed 192 (data not shown). 193 The MLVA proved to have an excellent ability to genotype X. fastidiosa from isolated strains as 194 well as from DNA extracted from plant sample.

Population structure of the French X. fastidiosa subsp. multiplex strains 203
Because DAPC and STRUCTURE analyses led to almost identical groupings (Fig. 3, Fig. S8 S9. The scenarios compared using ABC used independently for ST6 and ST7 samples. A) The 12 scenarios compared by the bottom-up approach. All scenarios considered that the French populations (Pop1 and Pop2) derived from an initial USA population. Ghost populations, i.e. populations that would have existed but were not sampled, were used in scenario 5-8 and 10-12. The time on the Y axis refers to the divergence date between populations. B) The five topologies analyzed by the top-down approach. All scenarios considered that the French populations (Pop1, Pop2 and Pop3) derived from an initial USA population. The time on the Y axis refers to the divergence date between populations.
Page 17 of 21 Fig. S10. List and visual representation of the 30 scenarios analyzed by ABC for ST6 samples. All scenarios considered that the French populations (C1P1, C2, and P2) derived from an initial USA population. The time on the Y axis refers to the divergence date between populations. Scenario II.7 (boxed in red) is the one defined by Abcrf analysis as the most probable.
Page 18 of 21 Fig. S11. List and visual representation of the 12 scenarios analyzed by ABC for ST7 samples. All scenarios considered that the French populations (C1P1, C2, and P2) derived from an initial USA population. The time on the Y axis refers to the divergence date between populations. Scenario II.7 (boxed in red) is the one defined by Abcrf analysis as the most probable.